Rework Linker dispatching for cross-major nvJitLink/driver skew #1911
Open
cpcloud wants to merge 8 commits into NVIDIA:main from
0c94703 to 5d8fa24
61ea4ff to a259b8d
Replace the module-level "decide once, use everywhere" nvJitLink-vs-driver choice with a per-Linker-instance decision that considers the CUDA driver major version, nvJitLink's availability and major version, the input code types, and whether link-time optimization is requested.

The dispatch is factored into a pure helper `_choose_backend()` that is fully unit-testable without a GPU. Its decision matrix:

- no nvJitLink, no LTO -> driver
- matching majors -> nvJitLink
- cross-major, no LTO -> driver (nvJitLink output may not be loadable)
- LTO + no nvJitLink -> RuntimeError
- LTO + cross-major -> RuntimeError

This resolves the cross-major driver scenario described in NVIDIA#712, where nvJitLink 12.x may produce a CUBIN that a 13.x driver cannot load (or vice versa). Previously, the code committed to nvJitLink unconditionally whenever it was importable.

Tests:

- `tests/test_linker_dispatch.py` parametrizes the entire matrix against `_choose_backend()` with mocked versions (no GPU or driver required).
- `tests/test_linker.py::TestLinkerDispatch` drives the same decisions through the real `Linker` constructor via monkeypatched version probes.
- `tests/test_optional_dependency_imports.py` is updated to exercise the new `_probe_nvjitlink()` helper in place of the removed `_decide_nvjitlink_or_driver()`.
- `tests/test_program.py` and `tests/test_linker.py` use a small local helper to compute the effective backend for the current environment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
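The matrix above reads naturally as a small pure function. Here is a minimal sketch, assuming the majors are passed as plain ints (None when the version is unknown or nvJitLink is absent). `choose_backend_sketch` is a hypothetical name, not the PR's `_choose_backend()` signature; it omits the input-code-type check mentioned above and folds in the unknown-driver rules described in the later commits.

```python
from typing import Optional


# Hypothetical sketch of the dispatch rules; not the PR's actual _choose_backend().
def choose_backend_sketch(
    driver_major: Optional[int],     # None if the driver version cannot be queried
    nvjitlink_major: Optional[int],  # None if nvJitLink is not importable
    lto: bool,                       # True if link-time optimization is requested
) -> str:
    """Return "nvjitlink" or "driver", or raise RuntimeError for unsupported LTO."""
    if nvjitlink_major is None:
        # The driver linker can do plain linking, but not real LTO.
        if lto:
            raise RuntimeError("link-time optimization requires nvJitLink, which is unavailable")
        return "driver"
    if driver_major is None:
        # Unknown driver (e.g. a build container): optimistically trust nvJitLink.
        return "nvjitlink"
    if driver_major == nvjitlink_major:
        return "nvjitlink"
    # Cross-major skew: nvJitLink's output may not be loadable by this driver.
    if lto:
        raise RuntimeError(
            f"link-time optimization needs nvJitLink, but nvJitLink {nvjitlink_major}.x "
            f"does not match driver {driver_major}.x"
        )
    return "driver"
```

For example, `choose_backend_sketch(13, 12, lto=False)` returns `"driver"`, while the same call with `lto=True` raises. Because the function takes versions as arguments instead of probing them, every row can be checked without a GPU, which is the property the commit relies on for its tests.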
driver_version() was called unconditionally in Linker.__init__, which fails in environments where nvJitLink is installed but the CUDA driver is absent (e.g., build containers). The constructor now catches the exception and sets driver_major=None. When driver_major is unknown and nvJitLink is available, the dispatcher optimistically selects the nvJitLink backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
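The dispatcher needs two version probes. Below is a minimal sketch of both, assuming the driver query returns the integer encoding used by cuDriverGetVersion (1000 * major + 10 * minor) and the nvJitLink query returns a (major, minor) tuple. Both queries are injected as callables so the sketch runs without CUDA, and the function names are hypothetical. The driver probe swallows the failure as this commit describes; the nvJitLink probe is memoized with `functools.lru_cache` so its warning fires at most once per loader, mirroring the summary's description of `_probe_nvjitlink()`.

```python
import functools
import warnings
from typing import Callable, Optional, Tuple


def driver_major_or_none(query_driver_version: Callable[[], int]) -> Optional[int]:
    """Return the CUDA driver major version, or None if it cannot be queried."""
    try:
        encoded = query_driver_version()
    except Exception:
        # No usable driver (e.g. a build container): report "unknown" instead of failing.
        return None
    # Assumed cuDriverGetVersion encoding: 12090 means 12.9.
    return encoded // 1000


@functools.lru_cache(maxsize=None)
def nvjitlink_major_or_none(
    query_nvjitlink_version: Callable[[], Tuple[int, int]],
) -> Optional[int]:
    """Return nvJitLink's major version, or None if it is unavailable.

    lru_cache memoizes the result per loader, so the warning below is
    emitted at most once no matter how many Linkers are constructed.
    """
    try:
        major, _minor = query_nvjitlink_version()
    except Exception as exc:
        warnings.warn(
            f"nvJitLink is unavailable ({exc!r}); falling back to the driver linker",
            RuntimeWarning,
            stacklevel=2,
        )
        return None
    return major
```

Calling these from the constructor rather than at import time is what keeps a driverless environment importable: nothing touches the driver until a Linker is actually built.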
Test helpers that called driver_version() at module scope would crash in no-driver environments during test collection. Mirror the production lazy-probe pattern: catch exceptions and pass None.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Linker_link was nulling self._drv_log_bufs right after cuLinkComplete, releasing the bytearrays whose addresses had been handed to the driver via CU_JIT_INFO_LOG_BUFFER and CU_JIT_ERROR_LOG_BUFFER at cuLinkCreate time. The CUlinkState retains those pointers until cuLinkDestroy, which runs during Linker tp_dealloc. Freeing the bytearrays first left the driver with dangling pointers and corrupted the heap; subsequent CUDA calls (e.g., NVRTC compilation in the next test fixture) segfaulted.

This path became reachable in CI with the new per-instance backend dispatch: CTK 12.9.1 + driver 13.0 runners now use the driver linker for cross-major linking, which had never been exercised before.

Retain _drv_log_bufs until the cdef class is deallocated; the pxd declaration order ensures that _culink_handle is released (and therefore cuLinkDestroy runs) before the bytearrays are cleared.
…etime

The CUDA driver docs state: "optionValues must remain valid for the life of the CUlinkState if output options are used." The driver writes the filled log sizes (outputs) back into the optionValues slots for CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES and CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES.

Linker_init previously declared c_jit_keys/c_jit_values as local cdef vector[...] variables on its own stack. They were destroyed when the function returned, so the driver's later writes during cuLinkAddData/cuLinkComplete/cuLinkDestroy went through dangling pointers.

This bug was always latent. It became reachable with the per-instance backend dispatch (CTK 12.9.1 runners now select the driver linker when paired with a driver 13 install), and it manifested only on driver 13, as heap corruption that killed the next NVRTC or link call.

Promote the two arrays to cdef class fields declared after _culink_handle in the pxd. Cython's tp_dealloc destroys C++ fields in pxd declaration order, so the vectors are destroyed after the shared_ptr deleter runs cuLinkDestroy. The cuda.bindings high-level wrapper (driver.cuLinkCreate) already handles this by attaching a keepalive to CUlinkState; cuda.core's low-level cydriver.cuLinkCreate path did not.

Also drop the now-unused void_ptr ctypedef.
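Both lifetime fixes apply one rule: any memory whose address the link state keeps must be owned by the object that owns the link state, and released only after the state is destroyed. The Cython code cannot run without a GPU, so here is a Python `ctypes` analogue rather than the PR's implementation. `FakeLinkState` and `LinkerSketch` are hypothetical, a single `size_t` slot stands in for the pointer-sized optionValues entry, and the explicit `close()` plays the role that pxd declaration order plays in the real `tp_dealloc`.

```python
import ctypes
from typing import Optional


class FakeLinkState:
    """Hypothetical stand-in for CUlinkState: it keeps raw addresses, not references."""

    def __init__(self, log_addr: int, size_slot_addr: int) -> None:
        self.log_addr = log_addr
        self.size_slot_addr = size_slot_addr
        # Input direction: the slot holds the log buffer's capacity at create time.
        self.capacity = ctypes.c_size_t.from_address(size_slot_addr).value

    def complete(self, message: bytes) -> None:
        # Output direction: write the log, then the filled size, through stored pointers.
        n = min(len(message), self.capacity - 1)  # leave room for a NUL terminator
        ctypes.memmove(self.log_addr, message, n)
        ctypes.c_size_t.from_address(self.size_slot_addr).value = n


class LinkerSketch:
    """Owns every buffer whose address was handed to the link state."""

    def __init__(self, log_size: int = 64) -> None:
        # Instance fields, not locals: they must outlive every use of self._state.
        self._log_buf = ctypes.create_string_buffer(log_size)
        self._option_values = (ctypes.c_size_t * 1)(log_size)
        self._state: Optional[FakeLinkState] = FakeLinkState(
            ctypes.addressof(self._log_buf), ctypes.addressof(self._option_values)
        )

    def link(self, message: bytes) -> bytes:
        if self._state is None:
            raise RuntimeError("linker already closed")
        self._state.complete(message)
        # Keep self._log_buf alive here: the state still holds its address.
        return self._log_buf.raw[: self._option_values[0]]

    def close(self) -> None:
        # Teardown order matters: destroy the state first, then free what it points into.
        self._state = None
        self._log_buf = None
        self._option_values = None
```

Storing the two buffers in locals of `__init__` instead would let CPython free them on return while `FakeLinkState` still held their addresses, which is the Python counterpart of the stack-allocated vectors described above.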
The as_bytes() method raises ValueError for unsupported backends (per its docstring and matching the test directly above this one). The driver-backend skip-guarded test was asserting RuntimeError, so it always failed on CTK 12.9.1 runners where the skip condition does not apply.
Adds parametrized cases for the build-container path where cuDriverGetVersion is unqueryable: with nvJitLink present the dispatcher picks nvjitlink optimistically; with nvJitLink absent it falls back to driver for non-LTO and raises for LTO. These paths are documented in _choose_backend's contract but were previously uncovered.
a259b8d to 53541a6
Summary
- Linker-instance dispatch at `__init__` time
- `_choose_backend()` helper for GPU-free unit testing
- `RuntimeError` for LTO when backends are incompatible
- `driver_version()` is probed lazily, so environments with nvJitLink but no driver (build containers) still work
- `_probe_nvjitlink()` is cached and warns at most once when nvJitLink is absent

Breaking change: `options.link_time_optimization=True` with nvJitLink absent now raises `RuntimeError` instead of silently passing `CU_JIT_LTO` to the driver (which did not perform real LTO linking).

Decision matrix
Test plan
test_linker_dispatch.py)

Closes #712
🤖 Generated with Claude Code